Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quick & dirty get endpoints for chunks #14

Merged
merged 1 commit into from
Jan 26, 2024
Merged

Quick & dirty get endpoints for chunks #14

merged 1 commit into from
Jan 26, 2024

Conversation

kerinin
Copy link
Contributor

@kerinin kerinin commented Jan 24, 2024

This makes some changes to the chunk data model to make it easier to return as a resource - I expect this will evolve further as we start actually populating the DB with chunks.

Chunk = Annotated[Union[TextChunk, ImageChunk], Field(discriminator='kind')]

class RetrieveResult(BaseModel):
score: Optional[float] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit unnecessary in terms of nesting, which is what I was originlaly trying to avoid by putting the score in the chunks.

Specifically, now a user needs to do results.map(chunk: chunk.chunk.text) to get to the text.

Colud we do a middle ground? I think the other improvement (indicating the discriminator) is helpful -- so we could keep BaseChunk and still split out the Chunk = Annotated[Union[...], Field(discriminator='kind')] and then use Sequence[Chunk] in the response.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just ran into that updating the demo app. I'll go back to using a base type and have a per-chunktype field for extra options

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into this more, I think this will need to change once the chunk/embedding data model is in place - we'll be searching for (and scoring) embeddings, and then mapping those back to chunks. I'm going to patch this up a bit for now.

dewy/chunks/models.py Show resolved Hide resolved
"""The similarity score of this chunk."""

text: Optional[str] = Field(..., description="Textual description of the chunk.")

metadata: Union[TextChunk, ImageChunk] = Field(..., discriminator='kind')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still feels oddly nested -- to get the content of the text chunk, you need to go through metadata. I think what we had here previously was cleaner -- putting things like the score into the chunk so that it is flatter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was leftover - ignore.


score: Optional[float] = None

class RetrieveResult(BaseModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is RetrieveResult used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh, orphan code. removing

dewy/chunks/models.py Show resolved Hide resolved
document_id: int
kind: Literal["image"] = "image"

image: Optional[str] = Field(..., description="Image of the node.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely some kind of location int he document? start/end index?

This makes some changes to the chunk data model to make it easier to
return as a resource - I expect this will evolve further as we start
actually populating the DB with chunks.
@kerinin kerinin merged commit f7027d9 into main Jan 26, 2024
@bjchambers bjchambers deleted the rm/get-chunks branch January 29, 2024 16:33
@bjchambers bjchambers added the enhancement New feature or request label Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants